Validity and repeatability of cardiopulmonary exercise testing in interstitial lung disease

Background Cardiopulmonary exercise testing (CPET), and its primary outcome of peak oxygen uptake (VO2peak), are acknowledged as biomarkers in the diagnostic and prognostic management of interstitial lung disease (ILD). However, the validity and repeatability of CPET in those with ILD have yet to be fully characterised, and this study addresses this evidence gap. Methods Twenty-six people with ILD were recruited, and 21 successfully completed three CPETs. Of these, 17 completed two valid CPETs within a 3-month window, and 11 completed two valid CPETs within a 6-month window. Technical standards from the European Respiratory Society established validity, and repeatability was determined using mean change, intraclass correlation coefficient and typical error. Results Every participant (100%) who successfully exercised to volitional exhaustion produced a maximal, and therefore valid, CPET. Approximately 20% of participants presented with a plateau in VO2, the primary criterion for establishing a maximal effort. The majority of participants otherwise presented with secondary criteria of respiratory exchange ratios in excess of 1.05, and maximal heart rates in excess of their predicted values. Repeatability analyses identified that the typical error (expressed as percent of coefficient of variation) was 20% over 3 months in those reaching volitional exhaustion. Conclusion This work has, for the first time, fully characterised how patients with ILD respond to CPET in terms of primary and secondary verification criteria, and generated novel repeatability data that will prove useful in the assessment of disease progression, and future evaluation of therapeutic regimens where VO2peak is used as an outcome measure.


Introduction
Interstitial lung disease (ILD) is the collective term for a series of pulmonary disorders characterised by inflammation, interstitial and alveolar damage, and irreversible declines in lung function [1]. Presently, ILD affects approximately 2 million people [2] and results in approximately 120,000 deaths globally [3]. Traditionally, resting measures of pulmonary function, including forced vital capacity (FVC) and the diffusion capacity of carbon monoxide (DL CO ), have been utilised to monitor disease progression and evaluate the efficacy of treatments. Both variables are predictive of mortality [4] and provide greater predictive power for survival over 6 months than histopathological factors alone [5]. However, these are not the sole factors predictive of mortality.
Cardiopulmonary exercise testing (CPET) is a dynamic diagnostic and prognostic test that simultaneously stresses multiple organ systems in order to identify causes of exercise intolerance, and obtain functionally useful biomarkers [6]. Lower values for peak oxygen uptake (VO 2peak ), the primary outcome from CPET, are also associated with increased risk of mortality and need for transplantation [7][8][9][10], enhancing the predictive power of static pulmonary function testing [8], whilst also maintaining high independent predictive power when these factors are controlled for [7]. Ventilatory efficiency and exercise-induced hypoxemia are also indicative of poorer prognosis [11], thus highlighting the importance of the more functionally derived data available from CPET as independent and dynamic prognostic outcome measures in addition to traditional, static, pulmonary variables.
The utility and validity of CPET in a range of pulmonary diseases have been described previously [12], and a key requirement of exercise protocols eliciting VO 2peak is confidence that a 'maximal' value has been achieved and that sub-maximal values are not mistakenly accepted [12]. However, it is unclear whether maximal exercise has actually been achieved in previous studies to utilise CPET in ILD, as these have either not reported how maximal exercise was classified [7][8][9], or only used limited criteria to establish a 'maximal' value [10]. Equally, there is a lack of data on the repeatability of CPET in ILD; understanding this is needed to accurately interpret significant and clinically meaningful changes in function, to inform and evaluate treatment options, and to appropriately assess disease progression [13].
Therefore, this study sought to characterise CPET responses in patients with ILD, focusing on the validity and repeatability of the test, with particular emphasis on VO 2peak , to further the evidence for using this parameter as an independent physiological marker of disease progression in ILD.

Study design, population and ethics
This analysis forms part of a wider study (PETFIB: Exploring the potential of Cardio-Pulmonary Exercise Testing as a biomarker in patients diagnosed with FIBrosing Lung Disease), whereby the clinical feasibility and patient acceptability of CPET, and initial results on participant characteristics, have been previously reported [14]. This study recruited 26 people with ILD [19 male] via convenience sampling, of differing diagnoses, and prescribed differing medications as per Table 1. All participants attended the research facility on three occasions, over a 20-month period from August 2017 to May 2019, with a period of 3 months (0.2-0.3 years) separating each visit where possible.
Ethics approval for this study was granted by the Health Research Authority (IRAS 220189) following review by the South West (Frenchay) Research Ethics Committee (17/SW/0059). All participants provided written and informed consent upon recruitment to the study.

Physiological measures
Participants' stature and body mass were assessed using standard methods, with body mass index (BMI) subsequently calculated. Body fat percentage was assessed using air displacement plethysmography (BodPod; COSMED, Rome, Italy), with subsequent values for fat mass and fat-free mass (FFM) calculated.
Retrospective measures of pulmonary function were obtained from medical records, whereby the date closest to the participant's first CPET was utilised. Measures included forced expiratory volume in one second (FEV 1 ), FVC and DL CO , expressed as absolute values and as a percent of predicted value for age, sex, and stature [15,16]. Furthermore, GAP scores, incorporating a composite of gender, age and physiology, were also calculated for each participant. Scores range from 0 to 8, whereby an increased score is indicative of a greater risk profile for early mortality [17]. Physical activity status was subjectively assessed using the General Practice Physical Activity Questionnaire [18].
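As an illustration of how a composite GAP score in the 0 to 8 range can be assembled, a minimal sketch is given below. The point allocations are not stated in this paper; they are assumed here from the published GAP index [17] and should be verified against that source before use. The function name `gap_score` and its arguments are hypothetical.

```python
def gap_score(sex, age_years, fvc_pct_pred, dlco_pct_pred):
    """Return an illustrative GAP score (0-8).

    Point allocations are ASSUMED from the published GAP index [17];
    they are not given in this paper. Pass dlco_pct_pred=None when
    DLCO could not be performed.
    """
    points = 1 if sex == "male" else 0          # G: gender

    if age_years > 65:                          # A: age
        points += 2
    elif age_years > 60:
        points += 1

    if fvc_pct_pred < 50:                       # P: FVC, % predicted
        points += 2
    elif fvc_pct_pred <= 75:
        points += 1

    if dlco_pct_pred is None:                   # P: DLCO, % predicted
        points += 3
    elif dlco_pct_pred <= 35:
        points += 2
    elif dlco_pct_pred <= 55:
        points += 1

    return points


# Hypothetical participant: 70-year-old male, FVC 60% and DLCO 40% predicted
print(gap_score("male", 70, 60, 40))
```

A higher returned value corresponds to the greater risk profile described above.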

Cardiopulmonary exercise testing
Participants underwent a CPET on an electronically braked cycle ergometer (Lode Excalibur; Lode, Groningen, the Netherlands), whereby the protocol incorporated an initial warm-up at 0 W for three minutes before an incremental ramp phase increased resistance by 10 W min −1 . Participants were instructed to maintain a self-selected cadence between 60 and 80 revolutions per minute (rpm) until volitional exhaustion, defined as a decrease in cadence < 10 rpm for 5 consecutive seconds despite verbal encouragement from research staff. Upon exhaustion, the resistance was removed, and participants returned to pedalling at 0 W for a further three minutes to cool down. This protocol has been detailed previously [14].
Throughout the CPET, measures of pulmonary gas exchange were recorded using a metabolic cart (Medgraphics Ultima; Medical Graphics UK Ltd., Gloucester, UK), calibrated for volume and gas concentrations prior to each test. Data were measured breath-by-breath and analysed in 10 s averages, with VO 2peak and presence of a plateau in VO 2 being determined using methods described previously [19]. Briefly, a linear regression was plotted over the 'linear' portion of the exercise test, with data from the first two minutes, and the last two minutes prior to exhaustion (or clinical termination), excluded. The VO 2 from this linear portion was then extrapolated over the remainder of the test, and residuals from the final 60 s were isolated and examined against the extrapolated portion. A negative residual indicated a deceleration in VO 2 against power output and was defined as a plateau when the magnitude of residuals was ≥ 5% of projected VO 2 (Fig. 1a). Either a positive or negative residual < 5% of projected VO 2 indicated a linear response (Fig. 1b). Finally, a positive residual ≥ 5% indicated an acceleration in VO 2 against power output (Fig. 1c).
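The plateau classification described above can be sketched as follows. This is a minimal illustration and not the implementation used in [19]: it assumes that VO 2 is regressed against time (a proxy for power output under a constant ramp), that residuals over the final 60 s are averaged, and that the percentage is taken relative to the mean projected VO 2 over that window. The names `fit_line` and `classify_vo2_response` are hypothetical.

```python
from statistics import mean


def fit_line(x, y):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def classify_vo2_response(times_s, vo2, exclude_s=120, final_s=60,
                          threshold_pct=5.0):
    """Classify the end-exercise VO2 response from 10 s averaged data.

    times_s: seconds from the start of the ramp; vo2: L/min.
    Returns (label, mean residual as % of projected VO2).
    """
    t_end = times_s[-1]
    # 'Linear' portion: drop the first and last two minutes of the test
    fit = [(t, v) for t, v in zip(times_s, vo2)
           if exclude_s <= t <= t_end - exclude_s]
    slope, intercept = fit_line([t for t, _ in fit], [v for _, v in fit])

    # Extrapolate the fit over the final 60 s and compare with observed VO2
    final = [(t, v) for t, v in zip(times_s, vo2) if t > t_end - final_s]
    projected = [slope * t + intercept for t, _ in final]
    residuals = [v - p for (_, v), p in zip(final, projected)]
    pct = 100 * mean(residuals) / mean(projected)

    if pct <= -threshold_pct:
        return "plateau", pct
    if pct >= threshold_pct:
        return "acceleration", pct
    return "linear", pct


# Synthetic 10-minute ramp in which VO2 stops rising after 480 s
times = list(range(10, 601, 10))
vo2 = [min(0.3 + 0.002 * t, 0.3 + 0.002 * 480) for t in times]
print(classify_vo2_response(times, vo2)[0])
```

The three returned labels correspond to the three response types illustrated in Fig. 1a-c.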
Normative values of Jones et al. [20], as suggested by the European Respiratory Society (ERS) [12], were utilised to present VO 2peak and peak work rate (WR peak ) as a percent of predicted. Determination of the gas exchange threshold (GET) was undertaken using the V-slope method as previously described [21], and verified using ventilatory equivalents for oxygen (V E /VO 2 ) and carbon dioxide (V E /VCO 2 ).
Subjective ratings of perceived exertion (RPE) and dyspnoea (RPD) were recorded at baseline, throughout the CPET, and at test termination, on validated scales of 6-20 and 0-10 respectively [22]. Participants also wore a 12-lead ECG (Welch Allyn CardioPerfect; Hillrom, Chicago, USA) and pulse oximeter (Choice MMed MD300C2; ChoiceMMed, Dusseldorf, Germany), to monitor cardiac changes and peripheral capillary oxygen saturation (SpO 2 ) respectively. All CPETs were supervised by an exercise physiologist and medical doctor, and the CPET was terminated if either ECG (e.g., arrhythmia) or SpO 2 responses warranted early cessation for patient safety. In the first round of CPETs, SpO 2 limit was set at < 88%, and extended to < 80% in the second and third CPETs as hypoxemia was shown to be well tolerated in the first CPET.

Table 1
Baseline anthropometric, pulmonary, and clinical data in study participants
All continuous variables reported as mean ± standard deviation. Categorical data presented as whole numbers
*Physical Activity scaled from 1 to 4 (1, inactive; 2, moderately inactive; 3, moderately active; 4, active)
BMI body mass index; IPF idiopathic pulmonary fibrosis; UIP usual interstitial pneumonia; CHP chronic hypersensitivity pneumonitis; FEV 1 forced expiratory volume in one second; FVC forced vital capacity; DL CO diffusion capacity for carbon monoxide; GAP gender-age-physiology score; CPET cardiopulmonary exercise test
a = n-1
b = n-2

Determination of validity
CPET was determined to be a 'maximal' effort (and therefore valid) if it satisfied at least one of the criteria set forward by a recently published technical standards document from the ERS [12]. With relation to the current study design (cycle ergometry with ramp protocol) and available measures (pulmonary gas exchange, work rate, retrospective spirometry and cardiac function), these criteria included a primary criterion of a plateau in VO 2 (as previously described), or one of numerous secondary criteria, including: (1) achieving predicted VO 2peak , using aforementioned normative equations of Jones et al. [20]; (2) achieving predicted WR peak , using normative equations of Jones et al. [20]; (3) achieving predicted maximal heart rate (HR max ; calculated as 220-Age); (4) peak ventilation (V E ) reaching, or exceeding, 85% of estimated maximal voluntary ventilation (MVV; calculated as FEV 1 × 40); and (5) respiratory exchange ratio (RER) ≥ 1.05.
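The verification logic described above can be sketched as a simple checklist. This is an illustrative sketch only: predicted VO 2peak and WR peak are passed in as inputs because the normative equations [20] are not reproduced here, and the function and argument names are hypothetical. The RER threshold is left as a parameter so that alternative cut-offs can be explored.

```python
def maximal_effort_criteria(*, plateau, vo2peak, vo2peak_pred, wr_peak,
                            wr_peak_pred, hr_peak, age_years, ve_peak,
                            fev1, rer_peak, rer_threshold=1.05):
    """Return the list of verification criteria met by a single CPET.

    vo2peak_pred and wr_peak_pred are assumed to be pre-computed from
    normative equations (not implemented here).
    """
    met = []
    if plateau:
        met.append("VO2 plateau (primary)")
    if vo2peak >= vo2peak_pred:
        met.append("predicted VO2peak")
    if wr_peak >= wr_peak_pred:
        met.append("predicted WRpeak")
    if hr_peak >= 220 - age_years:            # predicted HRmax = 220 - age
        met.append("predicted HRmax")
    if ve_peak >= 0.85 * (fev1 * 40):         # MVV estimated as FEV1 x 40
        met.append(">= 85% MVV")
    if rer_peak >= rer_threshold:
        met.append("RER threshold")
    return met


def is_valid_cpet(**measures):
    """A CPET is treated as maximal (valid) if at least one criterion is met."""
    return len(maximal_effort_criteria(**measures)) >= 1


# Hypothetical participant data for illustration only
example = dict(plateau=False, vo2peak=1.2, vo2peak_pred=2.0, wr_peak=80,
               wr_peak_pred=150, hr_peak=150, age_years=70, ve_peak=60,
               fev1=2.0, rer_peak=1.10)
print(maximal_effort_criteria(**example))
print(is_valid_cpet(**example))
```

Passing `rer_threshold=1.15` to the same function shows how a stricter RER cut-off changes which criteria are satisfied.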

Determination of repeatability
For participants who performed at least two valid CPETs within either a 3-month or 6-month period (an ecologically valid time frame reflecting frequency of clinical visits), differences between CPETs were established using paired samples t-tests. Reproducibility was established using an existing spreadsheet [23], with calculation of (a) changes in the mean, (b) Pearson's correlation coefficients, (c) intraclass correlation coefficients (ICC), (d) absolute typical error (TE), and (e) TE expressed as a percentage of the coefficient of variation (TE CV% ). Both TE and TE CV% were calculated with 95% confidence limits and a smallest worthwhile effect size of 0.2. This approach has previously been utilised for determining repeatability of exercise-based parameters in respiratory disease [24]. Furthermore, Bland-Altman analyses [25] identified the mean bias and limits of agreement (LoA) between repeated measures of VO 2 .
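To make the change in mean and typical error concrete, a minimal sketch follows. The exact formulas used by the spreadsheet [23] are not reproduced in this paper, so the sketch assumes the commonly used definition of TE as the standard deviation of the difference scores divided by √2, with TE CV% derived from log-transformed values. The ICC and confidence limits are omitted, and the function name `repeatability_summary` is hypothetical.

```python
import math
from statistics import mean, stdev


def repeatability_summary(test1, test2):
    """Summarise test-retest repeatability for paired measurements.

    ASSUMPTION: TE = SD(differences) / sqrt(2); CV% from log-transformed data.
    """
    diffs = [b - a for a, b in zip(test1, test2)]
    typical_error = stdev(diffs) / math.sqrt(2)

    # Same calculation on natural-log values gives the error as a ratio
    log_diffs = [math.log(b) - math.log(a) for a, b in zip(test1, test2)]
    te_log = stdev(log_diffs) / math.sqrt(2)
    te_cv_pct = 100 * (math.exp(te_log) - 1)

    return {
        "change_in_mean": mean(diffs),
        "typical_error": typical_error,
        "te_cv_pct": te_cv_pct,
    }


# Hypothetical VO2peak values (L/min) from two CPETs in four participants
cpet1 = [1.0, 2.0, 3.0, 4.0]
cpet2 = [1.5, 2.0, 3.5, 4.0]
print(repeatability_summary(cpet1, cpet2))
```

Expressing the error as a percentage allows comparison across variables with different units, such as absolute and mass-normalised VO 2peak .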

Statistical analyses
The assessment of validity and repeatability has been described within the methodology above and therefore, given these approaches, no formal power calculation was undertaken. With regards to correlation coefficients, magnitudes were described as small (0.1 to < 0.3), medium (0.3 to < 0.5) and large (≥ 0.5) as per existing thresholds [26]. For all analyses, statistical significance was set at p < 0.05.

Fig. 1 [...] Linear response (70-year old male with idiopathic pulmonary fibrosis); C: Acceleration of VO 2 against power (58-year old male with chronic hypersensitive pneumonitis). For all cases, the extrapolated regression line is fitted from 120 s, through to volitional exhaustion. VO 2 : oxygen uptake

Participant characteristics
Twenty-six participants were recruited, although clinical contraindications resulted in n = 2 being excluded from baseline CPETs, as described in Fig. 2, and described previously [14].
Therefore, n = 24 undertook at least one CPET. Further exclusions during the course of the study period resulted in n = 21 completing all three CPETs. Descriptive participant characteristics of the n = 26 recruited, n = 24 to undertake at least one CPET, and n = 21 to perform all three CPETs are listed in Table 1. A total of 67/78 prospective CPETs were completed during the course of this study.
A total of n = 17 participants successfully performed at least two CPETs within a 3-month period (14 ± 2 weeks, 11-16 weeks), with this either being between the first and second CPET, or between the second and third. If participants had a 3-month gap between their first and second, as well as second and third CPETs, data between the first and second CPETs was carried forward for repeatability analyses. Within this sample of n = 17, a further n = 12 participants successfully performed two CPETs within a 6-month period (27 ± 1 weeks, 25-29 weeks).
Factors that prevented all participants completing CPETs within the prescribed 3-month (and subsequent 6-month) periods included personal availability, malfunctioning equipment, laboratory availability, participants forgetting to attend scheduled research visits, and staff availability (affected by general clinical rotas and hospital winter pressures).

Changes in exercise and pulmonary function
Exercise-based outcomes for the n = 21 to complete all three CPETs are listed in Table 2, whereby WR peak ranged from 20 to 166 W in this group during the study, and VO 2peak ranged from 0.34 to 2.40 L min −1 . However, due to the unplanned variances in individual testing timelines as mentioned above, each participant had differing time frames between each CPET. Therefore, no formal analyses could be undertaken on these exercise-based parameters, and the data displayed in Table 2 are for descriptive purposes only, whereas formal repeatability analyses are presented further below. Furthermore, due to the retrospective nature of obtaining pulmonary function data, a number of data points could not be retrieved from participant medical records, leading to incomplete pulmonary function data as seen in Table 1. There was also wide variability in the time difference between pulmonary function tests and CPETs. The smallest difference was zero days, whereby a participant had undertaken pulmonary function testing on the same day as a CPET. The mean difference was − 32 ± 96 days (− 0.09 ± 0.26 years), indicating that pulmonary function tests were, on average, undertaken approximately 1 month prior to each CPET. However, the total range was from − 252 to 317 days (− 0.69 to 0.87 years). Therefore, given this disparity in timelines, and the fact that pulmonary function is not a primary outcome variable within this study, these data are only utilised as a descriptive variable in Table 1, and no further analyses are undertaken with regards to FEV 1 , FVC or DL CO . Further to these changes in pulmonary function, no mean change was identified in GAP score, or physical activity status.
Further to the occurrence of a number of plateaus in VO 2 , a number of secondary criteria were achieved by participants, with a full breakdown provided in Fig. 3B1, B2. Across all 67 CPETs, multiple secondary criteria were obtained: reaching predicted VO 2peak (n = 8), reaching predicted WR peak (n = 4), reaching predicted HR max (n = 29), reaching ≥ 85% MVV (n = 12) and reaching RER ≥ 1.05 (n = 60). One criterion (primary or secondary) was fulfilled in n = 24 CPETs, two criteria in n = 22, three criteria in n = 11, four criteria in n = 4 and five criteria in n = 2. No CPET fulfilled every primary and secondary criterion. Of the n = 24 CPETs terminated for clinical reasons (e.g., desaturation, cardiac contraindications), a total of n = 20 were verified as being maximal by reaching a number of primary or secondary criteria (1 criterion, n = 9; 2 criteria, n = 7; 3 criteria, n = 2; 4 criteria, n = 1). Of the remaining n = 4 that failed to present with any verification criteria, these were terminated by clinicians for desaturation (n = 3) and right bundle branch block (n = 1).
Therefore, these data indicate that 94% of performed CPETs were deemed valid. When solely considering participants who exercised to volitional exhaustion, this figure increases to 100% (42/42).
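The overall validity figure can be checked with simple arithmetic. The sketch below treats the reported tallies as the number of CPETs meeting exactly one, two, three, four or five criteria, which is an interpretation of the counts given above; the variable names are illustrative.

```python
# Number of CPETs by count of verification criteria met (from the text above)
cpets_by_criteria_met = {1: 24, 2: 22, 3: 11, 4: 4, 5: 2}
total_cpets = 67

valid_cpets = sum(cpets_by_criteria_met.values())
invalid_cpets = total_cpets - valid_cpets
valid_pct = 100 * valid_cpets / total_cpets

print(valid_cpets, invalid_cpets, round(valid_pct))
```

Under this reading, the four tests without any verification criterion account for the difference between the 67 completed CPETs and those deemed valid.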

Table 2
Changes in anthropometric and exercise responses at each study visit for n = 21 who completed all three study visits
All variables reported as mean ± standard deviation
BMI body mass index; CPET cardiopulmonary exercise test; DL CO diffusion capacity for carbon monoxide; FFM fat free mass; FVC forced vital capacity; GET gas exchange threshold; VO 2peak peak oxygen uptake; WR peak peak work rate
a n = 19 due to lack of body composition data, arising from equipment malfunction
b n = 20, c n = 18, d n = 15, e n = 19, all due to missing spirometry data from patient records
f n = 18 for GET due to non-detection of threshold
g n = 16 for GET due to non-detection of threshold

Repeatability of cardiopulmonary exercise testing
Of the n = 17 participants who successfully performed two CPETs within a 3-month period, all tests were deemed to be valid, and therefore repeatability data are determined from n = 17. Of the n = 12 participants who completed two CPETs within a 6-month period, one test was deemed to be invalid, and therefore repeatability data are determined from n = 11. Statistically significant differences were seen between CPETs for all parameters of VO 2 over a 3-month period, and most over a 6-month period, as shown in Table 3, with individual changes visualised in Fig. 4. Data in Table 3 also show the mostly large correlation coefficients, and the typical error associated with the repeatability of outcomes from each test, with this ranging from 12.7 to 25.5% over 3 months, and from 15.7 to 33.9% over 6 months, dependent on the variable being assessed.
When only considering participants who reached volitional exhaustion in both analysed pairs of CPETs, statistically significant changes in VO 2peak are observed over a 3-month period, but not a 6-month period, although the latter is only representative of n = 5 participants (Table 4). Furthermore, the typical error when expressed as a coefficient of variation is lower in participants who reached volitional exhaustion (20% for absolute VO 2peak , Table 4) than for the group which included all participants to produce a valid test (25% error for absolute VO 2peak , Table 3).
The mean bias in absolute VO 2peak for the CPETs performed 3 and 6 months apart was − 0.21 L min −1 each, although the subsequent standard deviations and LoA differed, as shown in Fig. 5A, B. Furthermore, for participants who reached volitional exhaustion, this mean bias remained − 0.21 L min −1 at 3 months (Fig. 5C), with a similar limit of agreement in those to reach volitional exhaustion at 6 months (Fig. 5D).
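For readers less familiar with Bland-Altman analysis [25], the mean bias and limits of agreement can be sketched as below. The sketch assumes the conventional 95% limits of bias ± 1.96 standard deviations of the differences, with differences taken as CPET 2 minus CPET 1. The data are hypothetical and the function name `bland_altman` is illustrative.

```python
from statistics import mean, stdev


def bland_altman(test1, test2, z=1.96):
    """Return (mean bias, lower LoA, upper LoA) for paired measurements.

    Differences are computed as test2 - test1, so a negative bias
    indicates lower values at the second test.
    """
    diffs = [b - a for a, b in zip(test1, test2)]
    bias = mean(diffs)
    sd = stdev(diffs)
    return bias, bias - z * sd, bias + z * sd


# Hypothetical absolute VO2peak values (L/min) at two visits
cpet1 = [2.0, 1.5, 1.8, 1.2]
cpet2 = [1.8, 1.2, 1.7, 1.0]
bias, lower, upper = bland_altman(cpet1, cpet2)
print(round(bias, 2), round(lower, 2), round(upper, 2))
```

Two samples can therefore share the same mean bias while having different limits of agreement, because the limits depend on the spread of the individual differences.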

Discussion
This study, for the first time, has fully characterised the validity of CPET, and repeatability of associated outcomes, in a cohort of patients with ILD. This work has shown that CPET is a valid tool, whereby all participants to reach volitional exhaustion during CPET provide a valid test; and novel data has been generated surrounding the repeatability and mean bias of exercise-based outcomes over a 3- and 6-month period, with particular reference to VO 2peak .

Abbreviations (figure legend): CPET: cardiopulmonary exercise test; HR max : maximal heart rate; RER: respiratory exchange ratio; V E /MVV: minute ventilation/maximal voluntary ventilation; VO 2 : oxygen uptake; VO 2peak : peak oxygen uptake; WR peak : peak work rate

Table 3
Changes in exercise responses in participants who successfully performed two valid cardiopulmonary exercise tests within a 3 month period (n = 17) and six month period (n = 11)
CPET cardiopulmonary exercise test; VO 2peak peak oxygen uptake; FFM fat free mass; WR peak peak work rate; GET gas exchange threshold; ICC intraclass correlation coefficient; TE typical error; TE CV% TE expressed as percent of coefficient of variation; 95% CL 95% confidence limit
a n = 17 due to lack of body composition data (equipment malfunction)
b n = 13 due to non-detection of gas exchange threshold
c n = 10 due to lack of body composition data (equipment malfunction)
d n = 7 due to non-detection of gas exchange threshold

Validity of cardiopulmonary exercise testing
In the first analyses of this study, focusing on the presence of a valid CPET, it was identified that 100% of participants who reached volitional exhaustion produced a valid test, and 94% of all tests were deemed valid, even if the participant did not reach volitional exhaustion; a highly encouraging statistic. When this is combined with an expressed preference for CPET above and beyond traditional, static, pulmonary function testing [14], this highlights the ability of CPET to be integrated into respiratory services as an additional biomarker for diagnostic, prognostic, and rehabilitative reasons for an older patient group. Traditionally, exercise studies have relied on the occurrence of a plateau in VO 2 to determine a maximal, and therefore valid, effort [12,27]. However, a plateau in VO 2 does not always occur during incremental exercise, as has been consistently evidenced in adults [28], children [29] and those with chronic respiratory disease [30,31]. This is corroborated by the present study, whereby only ~ 20% of exercise tests exhibited a plateau in VO 2 . Reliance on this physiological artefact as the sole indicator of VO 2max , and thus a maximal and valid test, is unwise as it can result in dismissal of perfectly valid and clinically useful data. Therefore, secondary criteria, such as reaching predicted values for VO 2peak , WR peak , HR max , maximal voluntary ventilation and the respiratory exchange ratio, are also used to determine whether a maximal test has been achieved [12].
Within the present study, the majority of participants presented with an RER ≥ 1.05 and a HR max exceeding their predicted value. In contrast, very few exceeded their predicted values for VO 2peak and WR peak , with this discrepancy likely due to the fact that predicted values for VO 2peak and WR peak are generated from healthy populations and thus patient populations will simply not reach these values because of their disease status. This does not discount using VO 2peak and WR peak values as secondary criteria, but the sole reliance upon these criteria is also not recommended, and therefore the entire CPET profile must be considered to evaluate whether a maximal effort has been reached. Further secondary criteria are available, such as changes in inspiratory capacity and blood lactate [12], and whilst these could not be assessed in the present study due to limitations of the study design, they provide a wider profile of physiological thresholds with which to determine a maximal effort.
Furthermore, it should be noted that the present study makes use of the most recent technical document for exercise testing in respiratory disease [12], whereas previous guidelines from over 15 years ago [27], which are not wholly specific to those with chronic lung disease, utilise a slightly differing set of criteria to determine a maximal effort. The most notable differences include the presence of subjective markers of perceived effort, which are excluded from the ERS document [12], as well as a different critical threshold value for the RER. The recent ERS technical standards suggest a value of 1.05 for maximal exercise [12], whereas older guidelines from the American Thoracic Society (ATS) suggest a more conservative value of 1.15 [27]. However, within the present study, had a value of 1.15 been adopted, then only an additional four CPETs (two of which were terminated by clinicians due to desaturation) would be deemed invalid for failing to satisfy any verification criteria, thus resulting in 88% of all CPETs, and 95% of CPETs to reach volitional exhaustion, being deemed valid. Therefore, the authors believe such a change in a singular secondary verification criterion does not detract from the overall validity seen for CPET in ILD. Moreover, subjective ratings of perceived effort and dyspnoea were collated within this study, but are not formally reported as this was only undertaken to safely monitor individual changes throughout exercise, and were recorded on a 6-20 scale, and not a 0-10 scale [22] as suggested by the ATS. However, understanding the repeatability and continued clinical change in perceptual responses to exercise will be of use to both clinicians and patients alike, and should be considered for future research and incorporation into future practice.

Table 4
Changes in exercise responses in participants who successfully performed two valid cardiopulmonary exercise tests, whilst reaching volitional exhaustion, within a 3 month period (n = 10) and six month period (n = 5)
CPET cardiopulmonary exercise test; VO 2peak peak oxygen uptake; FFM fat free mass; WR peak peak work rate; GET gas exchange threshold; ICC intraclass correlation coefficient; TE typical error; TE CV% TE expressed as percent of coefficient of variation; 95% CL 95% confidence limit
a n = 9 due to lack of body composition data (equipment malfunction)
b n = 9 due to non-detection of gas exchange threshold
c n = 3 due to non-detection of gas exchange threshold
To circumvent the reliance on secondary criteria, which are not always robust in determining whether a maximal effort has been reached [29,32], the use of supramaximal verification testing has been proposed, whereby an additional exercise challenge is presented to participants, at a work rate that is typically in excess of that achieved during a ramp incremental test [33]. This has been shown to be effective in healthy [28] and diseased adults [31], although the feasibility and acceptability of this additional physical work in people with ILD remains unknown, and the development of an optimal protocol for performing CPET in this population should be undertaken [12].
Furthermore, whilst the majority of participants reached volitional exhaustion during their CPET, 36% of participants' tests were terminated early and, subsequently, 6% of tests were deemed invalid due to a failure to produce sufficient maximal values. Therefore, to account for these situations whereby a clinical termination of a CPET may be required, submaximal exercise parameters should be investigated in those with ILD. Previous research in other clinical populations such as CF, heart failure and COPD, has focused on parameters of oxygen uptake efficiency (VO 2 /V E ) and ventilatory drive (V E /VCO 2 ) [34-37], both of which are also important in ILD [11]. The advantage of such parameters is that these do not require a full CPET to be completed, nor volitional exhaustion to be reached, which is in contrast to parameters such as the GET. Whilst the GET is considered sub-maximal, it is normally characterised as a percentage of VO 2peak , a process that requires participants to reach volitional exhaustion upon which to anchor the threshold, and therefore defeats the purpose of attempting to source a sub-maximal parameter. Therefore, exploration of truly submaximal parameters that are not dependent on maximal anchors, such as the oxygen uptake efficiency plateau [34, 38], is warranted in ILD to ascertain validity and prognostic utility.

Fig. 5 Bland Altman plots displaying mean bias and limits of agreement for absolute VO 2peak obtained from cardiopulmonary exercise tests. A: CPETs performed 3 months apart for n = 17 participants. B: CPETs performed 6 months apart for n = 11 participants. C: CPETs performed 3 months apart for n = 10 participants, who reached volitional exhaustion only. D: CPETs performed 6 months apart for n = 5 participants, who reached volitional exhaustion only. In each instance, difference (y-axis) presents data from CPET 2-CPET 1 (i.e., a value above zero indicates CPET 2 was higher than CPET 1 and therefore an increase in function has occurred). CPET: cardiopulmonary exercise test

Repeatability of cardiopulmonary exercise testing
Within the second analysis of this study, the repeatability of multiple outcome variables associated with CPET was established, notably that of VO 2peak . Given the importance of VO 2peak as a biomarker in ILD [7,8], it is critical to be able to identify error and variation in such measures, to allow successful inferences regarding physiological decline and efficacy of therapeutic regimens to be made.
As there is a paucity of data describing natural variation in pulmonary function [39], this study has provided valuable data in identifying variation in the equally valuable marker of VO 2peak . This study subsequently identified that absolute VO 2peak presented with a typical error, when expressed as a coefficient of variation, of 20% over a 3-month period in participants to reach volitional exhaustion. Whilst this error did reduce to ~ 10% over 6 months in those to reach volitional exhaustion, this analysis was only in a sample of n = 5 and must therefore be treated with caution.
Previous research has predominantly focused on younger, and healthy, individuals to identify repeatability of exercise testing [40,41], with the repeatability of CPET being established in some clinical groups, such as those with cystic fibrosis [24,42] and pulmonary arterial hypertension [43], identifying lower rates of error in VO 2peak than the present study. However, these studies were far shorter in length, ranging from 48-h repeatability to 6 weeks, and were undertaken in notably younger populations, and in conditions which present with notably different pathophysiology, co-morbidities, risk factors and treatment profiles to ILD. The repeatability of CPET in 'restrictive lung disease' has only been evaluated once previously, observing a variation of ~ 5% in VO 2 at peak exercise over a 28-day period [44]. However, this study from Marciniuk et al. [44] was published nearly 30 years ago, and was undertaken on only six patients, three with idiopathic pulmonary fibrosis (as per the current study), two with sarcoidosis and a single case of systemic sclerosis. Therefore, the results of this prior study should be interpreted with extreme caution, and even ignored when considering advances made in ILD management in the intervening decades. In utilising both a 3- and 6-month period of observation, we have utilised a time frame that is less burdensome than a smaller resolution (e.g., 1 week) that would require more frequent testing, whilst aligning this repeatability window with the schedule of routine clinical appointments that people with ILD have with their clinical support teams, dependent upon disease severity and trajectory [45]. As the research and clinical team were not manipulating any further treatment during this study period, all observed change would be due to disease progression and therefore the authors believe that the data presented in this study are more ecologically valid than the prior data over a 28-day time frame [44].
Further to reporting the repeatability of VO 2peak in this population, it must also be acknowledged that the disease trajectory of ILD is markedly different to other chronic respiratory diseases, having a median survival of only 2-3 years from diagnosis [46], unlike conditions such as chronic obstructive pulmonary disease or cystic fibrosis, whereby median survival times are ~ 10 and ~ 40 years from diagnosis and birth respectively [47,48]. This therefore calls into question which of the observed variances in VO 2peak are representative of 'normal' error that would ordinarily be observed between tests, and which reflect a genuine decline in physiological function. A number of participants within the present study had decrements in VO 2peak of > 0.5 L min −1 over a 3-month period, and therefore distinguishing between genuine variation and disease-driven change is of importance in ILD management, and will require corroboration of the current results to establish the true repeatability of VO 2peak . Moreover, further studies to assess repeatability over alternative timeframes (e.g., 1 week, 1 month) are warranted, aligning with the time course of potential health changes in this population.

Study considerations
There are a number of strengths to this study. Primarily, a robust protocol was utilised to elicit VO 2peak in participants through gold-standard CPET; and a thorough analysis of contemporary, internationally developed, technical standards was used to establish validity of CPET in this patient group. Moreover, choice of cycle ergometry was optimal in this group, as it is not only acceptable to patients with ILD [14], but is also likely less affected by dynamic stability. As people with ILD demonstrate impaired stability (e.g. stride length, contact time) during treadmill testing [49], cycling is a preferable modality as outcomes, such as VO 2peak , will reflect genuine cardiopulmonary function, instead of an ability to simply balance during the test.
In addition to the choice of methodology to elicit VO 2peak , the mathematical calculation of VO 2 plateaus, use of multiple techniques to assess repeatability (as opposed to relying on a single correlation for example), and determination of such repeatability over multiple, ecologically valid, time frames (3 and 6 months) in an under-reported group further add to the strengths and novelty of this investigation.
In respect to limitations, it is acknowledged that the unexpected variances in individual testing timelines (due